ABSTRACT

China has the largest number of online users in the world: about
20% of all Internet users are from China. This is a huge, yet still
poorly understood, market for the IT industry, for reasons such as
cultural differences. Twitter is the largest microblogging service
in the world and Tencent Weibo is one of the largest microblogging
services in China. Using data sets from both services as the source
for our study, we try to unveil the unique behaviors of Chinese users.

We collected the entire Tencent Weibo data set from 10 Oct 2011 to
5 Jan 2012 and obtained 320 million user profiles and 5.15 billion
user actions. We study Tencent Weibo at both the macro and the micro
level. At the macro level, Tencent users are more active in forwarding
messages but form fewer reciprocal relationships than Twitter users;
their topic preferences differ markedly from those of Twitter users
in both content and time spent; and information diffuses more
efficiently in Tencent Weibo. At the micro level, we evaluate users'
social influence mainly through two indexes, "Forward" and "Follower";
we study how users' actions contribute to their social influence and
further identify unique features of Tencent users. According to our
studies, the actions of Tencent users are more personalized and
diverse, and influential users play a more important part in the
whole network.

Based on the above analysis, we design a graphical model for
predicting users' forwarding behaviors. Our experimental results on
the large Tencent Weibo data set validate the correctness of the
discoveries and the effectiveness of the proposed model. To the best
of our knowledge, this work is the first quantitative study of the
entire Tencentsphere and of information diffusion on it.
1. INTRODUCTION

Tencent Weibo, the biggest microblogging website not only in China
but also in the world, is considered a powerful tool that is changing
people's traditional communication styles in China. The basic
functions of Tencent Weibo are similar to those of Twitter, including
"Follow", "Create tweets", "Retweet", "Reply", "Mention", "hashtag"
and so on. In Tencent Weibo, a tweet is called a weibo, a "Retweet"
a "Forward" and a "hashtag" a "Topic". Unlike Twitter, Tencent Weibo
provides additional functions that help users build their own
personalized media center for both obtaining and diffusing
information; these include "Mails", "Comments", picture and video
sharing, and others. "Comments" provide a dedicated space for users
to discuss a certain topic; "Mails" provide mail communication
between two users. All of these functions make it easier for users to
participate in group discussions and to express their opinions, and
let users spread interesting information more flexibly. We refer to
all user behaviors carried out through these functions as users'
actions. The actions of Tencent users result in a highly sensitive
information system, in which useful information spreads rapidly and
reaches millions of nodes as soon as, or even before, it is formally
announced.

Large-scale data analysis has been widely studied and applied to
many popular online social services, such as Twitter and Facebook.
Those studies offer deep insight into topological features and
dynamic trends of evolution, and their results are an important
complement to current social network theory, but they seldom consider
specific social networks that are embedded in different cultural,
economic and political backgrounds. In this study, we focus on
Tencent Weibo, the most famous Chinese microblog, which has more than
320 million users. There is no previous study of a social network
with such a huge number of users from the same country. What these
users talk about every day, how they construct their online
structures, and how they share and spread information are the
essential questions we hope to answer. The main purpose of this paper
is thus to study the statistical features of Tencent Weibo, which
could help us better understand its unique capabilities and
components. A set of experiments is designed to conduct a systematic
study at both the macro and the micro level: at the macro level, we
investigate whether Tencent Weibo is a new media center or a
communication network, and at the micro level, we investigate the
behavioral features of Tencent users and compare them with those of
Twitter users. We apply a parallel topic model, PeopleRank, CRFs
(Conditional Random Fields) and other MapReduce-based statistical
algorithms to support our research. Our main contributions are as
follows:

• We analyze users' actions in Tencent Weibo and find that, compared
with Twitter, Tencent Weibo is more likely to be a personalized
media center, and the features of users' actions are very different
from those of Twitter.

• The hot topics discussed in Tencent Weibo are very different from
those of Twitter.

• Compared with Twitter, Tencent Weibo has a more complex network
and more active users.

• As in Twitter, the numbers of "Forwards" and "Followers" are
related to users' actions in Tencent Weibo, but the relationships
hold only up to a certain extent.

• To verify the practical value of our discoveries in Tencent Weibo,
we propose a predictive model to illustrate an application of
forward analysis; the results show that statistical analysis of
Tencent Weibo can be effectively applied to guide the design of
weibo services.

This paper is organized as follows: Section 2 introduces related
work; Section 3 describes the data collection; Section 4 presents the
macro-level analysis; Section 5 presents the micro-level analysis;
Section 6 presents an application study; and Section 7 concludes.

2. RELATED WORK 

Microblogs, as a new style of online social network and social
media, have recently attracted more and more attention. For example,
Twitter, one of the world's biggest microblogging websites, has been
widely studied. Users' behaviors and social relations are perennial
research topics: Perra et al. took Twitter as the object of their
experiments and constructed an activity-driven model to describe the
structural features of the highly dynamic network [17]; Jie et al.
studied how different social ties influence the propagation of
information in Twitter [20]; Java et al. investigated how users
communicate with each other and form communities in Twitter [11];
Krishnamurthy et al. used follower-followee relationships to study
users' characteristics [13]. The influence of Twitter users is
another attractive topic: Wu et al. defined and applied new features
to observe the behaviors of different types of influential users in
Twitter [?]; Cha et al. measured influence in Twitter from three
aspects: the numbers of "Followers", "Forwards" and "Mentions" [6].
Some researchers focused on using the analysis results in real
applications: Hopcroft et al. studied the features of "Reciprocal"
relationships in Twitter and predicted users' behaviors [9]; Peng et
al. studied the features of "Retweet" in Twitter and applied CRFs to
recommend tweets [16]; Zhao et al. studied the potential reasons for
users' behaviors in Twitter and made predictions [21]. Other
interesting studies focus mainly on what users talk about in Twitter;
for example, HP Labs made a preliminary comparison between Twitter
and Sina Weibo to analyze their daily topics, and Banerjee et al.
detected the topic interests of Twitter users by analyzing their
posted tweets [1] [2]. Other studies explored the use of Twitter as a
social sensor and for prediction, such as [3] [10] [19] and [18].
Several studies have also examined Twitter systematically: Kwak et
al. applied statistical methods to analyze its features from both the
social network and the social media angles [14]. Similar to Kwak's
work, our research marks the first look at the entire microblogging
ecosystem of Tencent and compares it with Twitter. This is meaningful
for current research on social networks, because few studies have
examined specific microblogs with different cultural, economic and
political backgrounds. The study of Tencent Weibo's more than 320
million Chinese users may provide important information that
complements existing studies. We design a systematic attempt at
characterizing the unique features of Tencent Weibo; the experiments
take a deep look at users' online actions, information propagation
and dynamic content by comparing them with Twitter.

3. DATA COLLECTION 

We collected the entire data set of Tencent Weibo from 7 Oct 2011 to
5 Jan 2012, day by day, where each day contains 40 million to 150
million user actions. We use a cluster of 36 servers to store all
data sets in HDFS, where each machine has 15 Intel(R) Xeon(R)
processors (2.13 GHz) and 60 GB of memory. To obtain a high-quality
experimental data set, we also define rules to remove spam users from
Tencent Weibo. A summary of the final experimental data set is given
in Table 1.

The data collection is based mainly on all of the users' actions and
their social relations in Tencent Weibo. We build a profile for each
user as UserX = {Uname, Followers, Followees, Weibos}, where Uname is
the ID of the user, Followers is the collection of the user's
followers, Followees is the collection of the user's followees, and
Weibos = {id, type, content, parent, root, time, location} records
all weibos that the user has created. We assign a unique id to each
weibo; type is one of "Original Create", "Forward", "Reply",
"Comment", "Mail" and "Mention"; content contains text, pictures and
videos; time and location record when and where the weibo was
generated. Suppose we have two weibos A and B: if B is a "Reply",
"Comment" or "Forward" of A, then we call A the parent of B; if
another weibo C is the parent of A and C has no parent, then C is the
root of B. In addition, we collected a small data set of all actions
and relationships of 110 thousand Twitter users from 12 Oct to 23 Dec
using the Twitter API; this data set includes 293,386 following
relationships, 9,376,500 original tweets and 13,277,043 retweets. It
is used as an aid for some comparisons between Tencent and Twitter.
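The profile and the parent/root relation described above can be sketched in a few lines of Python. This is only an illustration: the class names `Weibo` and `UserProfile`, the helper `find_root`, and the convention that an original weibo is its own root are assumptions made for the example, not part of the collected data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Weibo:
    id: int
    type: str                      # e.g. "Original Create", "Forward", "Reply"
    content: str
    parent: Optional[int] = None   # id of the weibo this one responds to
    time: str = ""
    location: str = ""


@dataclass
class UserProfile:
    uname: str
    followers: List[str] = field(default_factory=list)
    followees: List[str] = field(default_factory=list)
    weibos: List[Weibo] = field(default_factory=list)


def find_root(weibo_id: int, by_id: Dict[int, Weibo]) -> int:
    """Follow parent links until a weibo without a parent is reached."""
    current = by_id[weibo_id]
    seen = {current.id}
    while current.parent is not None:
        current = by_id[current.parent]
        if current.id in seen:
            raise ValueError("cycle in parent links")
        seen.add(current.id)
    return current.id


# C is an original weibo, A forwards C, B replies to A.
c = Weibo(1, "Original Create", "original text")
a = Weibo(2, "Forward", "forwarded text", parent=1)
b = Weibo(3, "Reply", "reply text", parent=2)
by_id = {w.id: w for w in (c, a, b)}
root_of_b = find_root(b.id, by_id)
```

Here `root_of_b` is the id of C, matching the definition that C is the root of B when C is the parent of A and has no parent itself.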

4. MACRO LEVEL ANALYSIS 

4.1 Personalized Media or Social Circles 

In this section, we investigate users' main preferences in 
Tencent Weibo. These preferences can be described as whether 



Table 1: The Summarization of Tencent Micro-Blogging from 2011 Oct to 2012 Jan

Items        | Users       | Original      | Forwards      | Replies    | Comments    | Mails       | Mention
Total Number | 326,497,021 | 3,607,924,594 | 1,026,243,542 | 43,658,122 | 299,354,146 | 174,440,376 | 2,347,927

users prefer to build a personalized media center or a social circle
in Tencent Weibo. The evaluation is based mainly on users' actions:
"Original Create" and "Forward" can be considered as creating and
sharing information from a personalized media center, while "Reply",
"Mention", "Mail" and "Comment" can be considered as social
communication. "Reciprocity" is also an important index for measuring
users' willingness to form social circles. We therefore evaluate all
users' actions from three aspects: Power Law Growth, Exponent
Distribution and Reciprocity.

4.1.1 Power Law Growth of Users' Actions

Growth analysis is applied to observe the growth rate of users'
actions, which helps us better understand users' actions in Tencent
Weibo. Growth analysis has been widely used on websites such as
Del.icio.us, Flickr and YouTube [15]. We design similar experiments
to evaluate Tencent Weibo; the results are shown in Figure 1.

In Figure 1, three sub-figures illustrate the growth of three main
user actions ("Original Create", "Forward" and "Reply") along the
intrinsic time dt. The intrinsic time dt is defined as the increase
in the total number of user actions during a fixed time interval. In
this experiment, we set the time interval to one hour. The growth of
all three actions along the intrinsic time closely follows a power
law (a straight line in a log-log plot) across the entire timeline.
The dashed black lines are provided as an aid to the eye, with growth
exponents gamma of 0.9826, 0.9189 and 0.7848 respectively, which
means that users in Tencent Weibo prefer to express their opinions by
creating new weibos rather than by communicating with others.
According to statistics from the company Sysomos ^1, the exponents of
"Original Create", "Forward" and "Reply" in Twitter are around
0.9224, 0.8387 and 0.9060 respectively. This shows that Tencent users
are more active in creating and sharing new messages than Twitter
users, while Twitter users tend to reply to rather than forward
messages. In other social networks, such as Del.icio.us [5], Flickr
and YouTube [15], the exponent is around 0.8. Tencent users tend to
propagate existing weibos first of all through "Forward" actions,
while "Reply" actions have the lowest exponent. We also calculate the
exponents of "Comment", "Mention" and "Mail"; they are larger than
that of "Reply" and smaller than that of "Forward". About 28% of
originally created weibos are related to daily sentiment. About 70%
of originally created weibos concern a certain topic, such as
entertainment, economics, politics or science, among which
entertainment is the most popular. About 64% of originally created
weibos contain pictures or videos. In contrast, Twitter users prefer
events related to hot economic, political and scientific topics. The
small figures embedded in each sub-figure show the growth of the
actions along real time; all three actions exhibit linear growth.
From this analysis, we conclude that Tencent users prefer to create
and share information rather than to communicate with others, while
communication actions account for a total of around 30%

^1 http://www.sysomos.com/insidetwitter/



of all the actions. 
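A growth exponent of this kind can be estimated as the slope of a straight line in log-log coordinates. The sketch below uses an ordinary least-squares fit on synthetic data; the fitting procedure actually used for Figure 1 is not stated in the text, so the function `fit_power_law_exponent` and the sample values are assumptions for illustration only.

```python
import math


def fit_power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. gamma in y ~ x**gamma."""
    logs = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(logs)
    if n < 2:
        raise ValueError("need at least two positive points")
    mean_x = sum(lx for lx, _ in logs) / n
    mean_y = sum(ly for _, ly in logs) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in logs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in logs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    return sxy / sxx


# Synthetic example: one action type growing as 3 * dt**0.9 (made-up numbers).
intrinsic_time = [10, 100, 1000, 10000]
action_counts = [3 * t ** 0.9 for t in intrinsic_time]
gamma = fit_power_law_exponent(intrinsic_time, action_counts)
```

On exact power-law data the fitted slope recovers the exponent; on real counts, the result depends on the fitted range and on how the data are binned.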

4.1.2 Exponent Distribution of Users' Actions

To further confirm our conclusions, we design the following
experiment to investigate how users with different activity levels
organize their actions. We select the top 100,000 active users and
100,000 normal users ranked around the 75% level by their total
number of actions, and then calculate their exponent distributions
for the different types of actions. The activity exponent is
calculated as follows:

$$\mathrm{Exponent}_{type}(user_i) = \frac{\log(F^{type}_{max})}{\log(F_{max})} \qquad (1)$$

In Equation 1, type denotes the action type; F^{type}_{max} is the
final number of actions of that type performed by user user_i, and
F_{max} is taken here to be the final total number of actions of
user_i. Using this equation, we plot the exponent distributions of
all actions (Original, Reply, Forward and Mail) for all selected
users.
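As a rough illustration, the per-type exponent of a single user can be computed as below. The sketch assumes one plausible reading of Equation 1, namely the logarithm of the per-type action count divided by the logarithm of the user's total action count; the function name `activity_exponents` and the sample user are hypothetical.

```python
import math
from collections import Counter


def activity_exponents(action_types):
    """Map each action type to log(count of that type) / log(total actions).

    Assumed reading of Equation 1; a user needs at least two actions so
    that the denominator is positive.
    """
    counts = Counter(action_types)
    total = sum(counts.values())
    if total < 2:
        raise ValueError("need at least two actions")
    log_total = math.log(total)
    return {t: math.log(c) / log_total for t, c in counts.items()}


# Hypothetical user with 90 original weibos and 10 forwards.
actions = ["Original"] * 90 + ["Forward"] * 10
exponents = activity_exponents(actions)
```

Under this reading, a user who performs only one type of action has exponent 1 for that type, and a type used exactly once has exponent 0.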

In Figure 2, the left sub-figure shows the exponent distributions of
active users and the right one shows the exponent distributions of
normal users (75% rank). Among the active users in the left
sub-figure, around 40% to 50% have a strong preference for one main
action (for example, users who only create "Original" weibos, or
users who only forward others' weibos); the remaining users divide
their activity among different actions in different proportions, and
most of them prefer "Original" over the others (with exponent values
between 0.6 and 0.8). We can also observe that "Original" and
"Forward" show an increasing trend, while "Mail", "Reply", "Mention"
and "Comment" show highly consistent decreasing trends. Similar
results can be observed for the normal users in the right sub-figure;
normal users appear to be more active in creating new weibos and less
active in social communication. This phenomenon illustrates that
Tencent users tend to build a personalized media center rather than
create social circles; highly active users seem to be more active in
communicating with others, while normal users are more likely to
create and share information.

4.1.3 Reciprocity Analysis 

In this section, we analyze how Tencent users formed reciprocal
relationships during the three months of data collection. We use the
X axis for the number of "Reciprocity Friends" and the Y axis for the
number of users who formed that number of "Reciprocity Friends"
during those three months, and plot the reciprocity distribution in
Figure 3.

As can be seen in Figure 3, the reciprocity follows a power-law
distribution with an exponent of -0.5261. The average reciprocity
number is 2.7342, which means that a user forms on average 2.7342
reciprocal relationships with other users. Over the three months,
reciprocal relationships amount to about 0.7% of the total number of
"Follow" actions, which shows that after almost two years of
development (Tencent Weibo formally went online on 1 April 2010), the
growth of reciprocity is quite small compared with the growth of
"Following" relationships in Figure 10, where the average number of
following actions is 64. Based on this observation, we find that
users are more likely to follow new users in whom they are interested
than to follow back their own followers. One main reason is that
Tencent Weibo is built on the user groups of Tencent QQ (the biggest
online chat service in China), which brings the social relationships
of QQ users into Tencent Weibo once they start using Weibo; in
addition, users can communicate with each other through other, more
efficient Tencent tools such as Tencent QQ.

Figure 2: Exponent Distribution of Users' Behaviors
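The reciprocity distribution described in this section can be derived from a list of follow edges. The following is a minimal sketch on a toy edge list; the function name `reciprocal_friend_counts` and the (follower, followee) tuple layout are assumptions, and the exact counting rules used for Figure 3 are not specified in the text.

```python
from collections import Counter


def reciprocal_friend_counts(follows):
    """Count, for each user, how many users they follow who also follow them back."""
    edges = {(a, b) for a, b in follows if a != b}
    counts = Counter()
    for follower, followee in edges:
        if (followee, follower) in edges:
            counts[follower] += 1
    return counts


# Toy follow edges as (follower, followee) pairs.
follows = [
    ("u1", "u2"), ("u2", "u1"),   # u1 and u2 follow each other
    ("u1", "u4"), ("u4", "u1"),   # u1 and u4 follow each other
    ("u1", "u3"), ("u3", "u4"),   # one-way follows
]
friend_counts = reciprocal_friend_counts(follows)
# Number of users (Y axis) having each number of reciprocity friends (X axis).
distribution = Counter(friend_counts.values())
# Share of follow edges that are reciprocated.
reciprocated_share = sum(friend_counts.values()) / len(set(follows))
```

Users without any reciprocal relationship do not appear in `friend_counts`, so an average over this mapping covers only users with at least one reciprocal friend.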

4.2 Information Diffusion 

4.2.1 Distribution of "Forwarding"

We plot the "Forward" distribution in Figure 4, where the X axis
represents the number of "Forwards" and the Y axis represents the
number of weibos that have that number of "Forwards". As seen in
Figure 4, the curve follows a power-law distribution with an exponent
of -1.7415, which is larger than that of Del.icio.us [5] at -3.5 and
those of YouTube and Flickr [15], which are -3.5 and -8.2
respectively (these studies count how many tags are generated for a
given resource). The mean value is 10.0304, which means that each
weibo in Tencent is forwarded around 10 times on average. This value
is also larger than those of Del.icio.us, YouTube and Flickr. As for
Twitter, according to our Twitter data set, the average number of
retweets is about 2.3609, which is far smaller than in Tencent;
moreover, only 6% of tweets in Twitter are retweeted, while 16.7% of
weibos in Tencent are forwarded. This result shows that information
in Tencent can reach a wider audience than in Twitter.

In contrast to the high average forwarding value, 97% of weibos are
forwarded fewer than 10 times, which means that Tencent users have
greater enthusiasm for popular news, so that a small number of weibos
have been forwarded an enormous number of times. This phenomenon is
similar to that in Twitter.

Figure 4: The Distribution of Forward Behaviors

4.2.2 Depth of "Forwarding"

We plot the depth distribution of "Forward" propagation in Figure 5.
As seen in this figure, 96% of Tencent weibos have never been
forwarded and 95% of forwarded weibos have been forwarded fewer than
10 times, while for the rest of the weibos the depth distribution
follows a power law with a decay exponent of -2.8599, which is
smaller than that of Twitter, -1.5114 [14]. The average "Forward"
depth is 1.2898, which is similar to that of Twitter. The maximum
depth is 69, which is far larger than that of Twitter, where it is
less than 20; only 75 weibos reach that level. There are two possible
reasons for this. First, the follow network in Tencent may be more
complex than that of Twitter: both microblogs had a similar number of
users at the end of 2011, while the average "Following" number in
Tencent is around 64, which is higher than that of Twitter, which is
less than 50; the greater tendency of Tencent users to follow others
leads to higher complexity. Second, it is more common in Tencent for
online business groups to hype certain events involving sensitive
factors (most of which relate to public fairness, morality and the
like) in order to attract the public's attention and profit from it;
one main method is to keep forwarding the related information within
a very short time period, which also increases the depth of forwards.
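The depth of a forward chain can be computed from the parent links of the weibo records. The sketch below assumes that depth means the number of hops from a weibo to the root of its chain, that the input is a mapping from weibo id to parent id (None for originals), and that parent links are acyclic; the function name `forward_depths` is hypothetical.

```python
def forward_depths(parent):
    """Map each weibo id to its number of hops from the root of its forward chain."""
    depth = {}
    for start in parent:
        path = []
        on_path = set()
        node = start
        # Walk up until we reach a root or a weibo whose depth is already known.
        while node not in depth and parent.get(node) is not None:
            if node in on_path:
                raise ValueError("cycle in parent links")
            path.append(node)
            on_path.add(node)
            node = parent[node]
        base = depth.setdefault(node, 0)
        # Assign depths on the way back down, nearest to the known node first.
        for step, weibo_id in enumerate(reversed(path), start=1):
            depth[weibo_id] = base + step
    return depth


# Toy data: weibo 1 is forwarded by 2 and 4, and 2 is forwarded again by 3.
parent = {1: None, 2: 1, 3: 2, 4: 1, 5: None}
depths = forward_depths(parent)
max_depth = max(depths.values())
```

Each weibo is visited a bounded number of times because known depths are reused, which matters when the same popular root is shared by many chains.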
Figure 5: The Relationship between "Follows" and "Forwards"



4.2.3 Time of "Forwarding"

As seen in Figure 6, about 35% of Forward actions happened within one
hour, 45% within one day, 14% within one week and 4.3% after one
month. The distribution follows a power law with an exponent of
-1.9058, and the average forwarding time is 95,875 seconds, which is
about 26.6 hours.
Figure 6: The Relationship between "Forward" Propagation and "Time"

Compared with Twitter, information spreads a little more slowly in
Tencent within one hour (10% slower), but faster within one day (8%
faster) and one week (12% faster). The reason is that the networks
formed by Tencent users are more complex than those of Twitter, which
makes spreading less efficient at first; on the other hand, Tencent
users propagate information at a higher rate than Twitter users, so
after some time information in Tencent often spreads to a wider
audience than in Twitter.
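Shares such as "within one hour" or "within one day" can be obtained by bucketing the delay between a weibo and each of its forwards. The sketch below uses non-overlapping buckets and a 30-day month, both of which are assumptions; the function name `bucket_forward_delays` and the sample delays are hypothetical.

```python
HOUR = 3600
DAY = 24 * HOUR
WEEK = 7 * DAY
MONTH = 30 * DAY  # assumption: a month is taken as 30 days


def bucket_forward_delays(delays_seconds):
    """Return the share of forwards falling into each non-overlapping time bucket."""
    if not delays_seconds:
        raise ValueError("no delays given")
    limits = [
        ("within 1 hour", HOUR),
        ("within 1 day", DAY),
        ("within 1 week", WEEK),
        ("within 1 month", MONTH),
    ]
    counts = {label: 0 for label, _ in limits}
    counts["after 1 month"] = 0
    for delay in delays_seconds:
        for label, limit in limits:
            if delay <= limit:
                counts[label] += 1
                break
        else:
            counts["after 1 month"] += 1
    total = len(delays_seconds)
    return {label: count / total for label, count in counts.items()}


# Made-up delays in seconds between an original weibo and its forwards.
delays = [120, 1800, 7200, 90000, 700000, 4000000]
shares = bucket_forward_delays(delays)
average_hours = round(95875 / HOUR, 1)  # the reported mean of 95,875 seconds in hours
```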

4.3 Topical Analysis 

4.3.1 Ranking Topics and Tencent Users 

In Tencent Weibo, users can also create topics, which are similar to
hashtags in Twitter. These topics help us better understand the daily
interests of Tencent users. We first collect all topics created on
each day (we only consider topics with participants), obtaining
around 50,000 of the most popular topics. We then select the top 20
ranked weibos of each topic as its content, and apply a parallel LDA
^2 model to cluster all topics (we set the number of topics to 50 and
show the top 10 ranked topics). The topic distributions of the first
10 days (from 7 Oct 2011 to 17 Oct 2011), the second 10 days and the
third 10 days are shown in Figure 7:

As seen in Figure 7, unlike in Twitter, entertainment (such as Online
Game, QQ Space, Picture and Video Share) has the highest weight in
Tencent Weibo; in addition, a certain number of users are constantly
interested in Sports, Cars and Economics, and keep creating related
weibos at all times. One interesting phenomenon is that users are
more interested in sudden events, most of which are related to social
affairs and global events. For example, in Time Period 2, a
transportation accident and a homicide in the Golden Triangle caused
a huge wave in

^2 http://code.google.com/p/plda/
Figure 7: Topic Distributions for Tencent Weibo From Different Time Periods



Tencent Weibo, and the related keywords rose into the top 30 within
only 10 days. This means that a huge number of users in Tencent Weibo
are highly interested in specific topics, and that they can share
certain information quickly and propagate it rapidly to a wide
audience.

We also rank all Tencent users by applying a parallel PeopleRank
algorithm, which is an improved version of parallel PageRank [12]. We
compare the results with Twitter in Table 2.
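For orientation, the sketch below shows plain PageRank by power iteration on a small follow graph. It is not the PeopleRank variant used in this study, whose modifications are not described here, and it is single-machine rather than parallel; the function name `pagerank`, the damping factor 0.85 and the toy edges are assumptions.

```python
def pagerank(follows, damping=0.85, iterations=100):
    """Plain PageRank by power iteration; rank flows from follower to followee."""
    users = sorted({user for edge in follows for user in edge})
    out_links = {user: [] for user in users}
    for follower, followee in follows:
        out_links[follower].append(followee)
    n = len(users)
    rank = {user: 1.0 / n for user in users}
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly over all users.
        dangling = sum(rank[u] for u in users if not out_links[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        for u in users:
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Toy graph: three users follow "star", and "star" follows only "a".
follows = [("a", "star"), ("b", "star"), ("c", "star"), ("star", "a")]
ranks = pagerank(follows)
top_user = max(ranks, key=ranks.get)
```

In this toy graph the most followed account ends up with the highest rank, and the account it follows ranks second because it inherits most of that rank.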

In Table 2, the top 20 ranked Tencent users are mainly from
entertainment and sports, such as famous hosts, actors and sports
accounts, together with Tencent's own official accounts; a famous
writer and an economist are also included. In Twitter [14], the roles
of most of the top ranked users are similar to those in Tencent,
while the first ranked user is BarackObama. The main difference is
that in Tencent Weibo the number of top ranked official agencies
(nine agencies) is larger than that in Twitter (three agencies),
which means that traditional media still play an important role in
broadcasting information and providing services in Tencent Weibo. On
the other hand, unlike in Twitter, accounts with content related to
jokes, literature and fashion make up a high proportion (about nine
of the top twenty ranked accounts). This phenomenon stems mainly from
the different cultural background: in China, people often like to use
poetry and jokes to express their sentiments or dissatisfaction, and
young people in China are enthusiastic about pursuing fashion.

4.3.2 Topical Trending Analysis 

First, we investigate what kind of information can become "popular"
in Tencent Weibo and what its popularity patterns are. We find four
different popularity patterns among all detected topics; sample
topics for each pattern are plotted in Figure 8. Each time period
contains 10 days, and the influence weight is the normalized number
of weibos of the topic.

Based on Crane and Sornette's theory [7], popularity patterns can be
divided into four categories, which correspond to the four sample
sub-figures. The first is determined mainly by exogenous factors: it
appears as a sudden outbreak that spreads rapidly throughout the
whole of Tencent Weibo in a very short time and then decreases to a
low level for a long time period. "School Bus" was a serious accident
that drew the public's attention to government management for quite a
long time. The second is determined by both exogenous and endogenous
factors, where the exogenous factors have a big influence on the
endogenous ones. For example, the events surrounding Gaddafi lasted
for almost one year, during which users kept discussing the topic at
a certain level of activity; when the "Death of Gaddafi" broke out,
the popularity reached an amazing height and stayed there for several
days, after which the popularity of the topic decreased rapidly and
remained at a lower level. The essence of this pattern is that once
users' expectations are satisfied by external factors, their concern
for the event declines. The third pattern is a cyclic repetition of
the second one; especially in the stock market, users' expectations
are satisfied again and again by external factors, which generates
fluctuations. The fourth pattern exhibits a life cycle. This pattern
is especially common for online games, products and the like, which
can attract and retain a huge number of users for quite some time,
after which their influence decreases slowly. As can be seen in
sub-figure (d), the game Sky Scraper was very popular between October
and December; it peaked at the end of November, after which Tencent
users showed less and less interest in the game.

Figure 8: Popular Patterns in Tencent Weibo



Table 2: Top 20 ranked users by PeopleRank from both Tencent and Twitter

Rank | Tencent ID | Tencent Name              | Tencent Remark       | Twitter ID     | Twitter Name       | Twitter Remark
1    | 1323005700 | High Quality Joke         | Services             | aplusk         | ashton             | actor
2    | 30818627   | We like Jokes             | Services             | obama          | Obama              | president
3    | 88886666   | Super QQ                  | Official Weibo       | CNNBrk         | CNN Breaking News  | news
4    | 970144221  | DNF                       | Official Online Game | TheEllenShow   | Ellen DeGeneres    | Show host
5    | 2360330217 | Mood Helper               | Official Weibo       | britneyspears  | Britney Spears     | musician
6    | 19990210   | QQ Product Team           | Official Weibo       | Oprah          | Oprah Winfrey      | show host
7    | 622004906  | Na Xie                    | Host                 | THE REAL SHAQ  | THE REAL SHAQ      | Sports Star
8    | 611986579  | Jiong He                  | Host                 | Johncmayer     | John Mayer         | Musician
9    | 1379986183 | Beauty Story              | Official Weibo       | twitter        | Twitter            | Twitter Weibo
10   | 2367520831 | Libo Zhou                 | Show Host            | RyanSeacrest   | Ryan Seacrest      | Show Host
11   | 302536308  | Text that touch your soul |                      | Lancearmstrong | Lance Armstrong    | Sports Star
12   | 3756217    | Hot Jokes                 | Services             | JimmyFallon    | Jimmy Fallon       | Actor
13   | 2581425470 | QQ Net                    | Services             | iamdiddy       | iamdiddy           | Musician
14   | 611985987  | NBA                       | Sports               | mrskutcher     | Demi Moore         | Actress
15   | 305974257  | Weian Qiao                | Writer               | PerezHilton    | Perez Hilton       | Power Blogger
16   | 345005673  | Beauty Sentences          | Services             | nytimes        | The New York Times | News
17   | 622006051  | Zhiqiang Ren              | Finance              | mileycyrus     | Miley Cyrus        | Actress and Musician
18   | 343551325  | Joke Base                 | Services             | stephenfry     | Stephen Fry        | Actor
19   | 772937290  | Fashion and Constellation | Services             | The Onion      | The Onion          | News
20   | 99912345   | Tencent News              | News                 | KimKardashian  | Kim Kardashian     | model
Figure 9: The Age Trending of Topics in Tencent Weibo



In addition, we observe the age of trending topics (the freshness of
topics) in Tencent and compare it with Twitter and Google [14]. The
results for Tencent, shown in Figure 9, give the proportion of old
and fresh information in each time slice.

In Tencent, the proportion of older topics (more than one month old)
is around 40%; in Google, the proportion of older keywords is less
than 10%, while in Twitter the proportion of older tweets is less
than 20% [14]. This shows that many older topics in Tencent Weibo
remain consistently popular and are discussed among Tencent users;
these topics provide steady platforms for users to get to know and
communicate with each other. On the other hand, the proportion of new
topics on each day is also around 40%. This phenomenon illustrates
the different behavior patterns in Tencent and Twitter: in Tencent,
many celebrities like to create hashtags, such as real estate or the
situation in the Middle East, as micro forums, and many other users
join them for a long period of time.

5. MICRO LEVEL ANALYSIS 

In this section, we focus on the relationship between users' actions
and their social influence at the micro level, and compare the
results with Twitter. In our research, a user's influence is measured
mainly from two aspects: "Follow" relationships ("Followers" and
"Followees") and "Forward" actions. To better evaluate users'
influence from these two aspects, a batch of experiments is designed
for each question; the results are summarized in the following
sections.

5.1 Analyze Influence from "Follow" Actions

This section consists of three subsections on "Follow" actions: the
first introduces the statistical features of "Followees" and
"Followers", which can also be used to describe the influence
distribution of Tencent users; the second examines how the number of
"Followees" influences the number of "Followers"; the third examines
how other actions influence the number of "Followers".

5.1.1 Distribution of "Followees" and "Followers"

We draw the distributions of "Followees" and "Followers" in
Figure 10; both follow power-law distributions with steep
power-law tails (the exponent of "Followees" is -3.0096 and
the exponent of "Followers" is -3.6849, which are smaller
than those of Twitter (both around -2.276) [14] and similar
to that of Delicious (around -3.5) [5]). The average number
of "Followees" is 64, which is larger than that of Twitter
(around 42 according to our collected Twitter data), and the
average number of "Followers" is 155, which is also larger
than that of Twitter (around 136 according to our collected
Twitter data). These figures illustrate that Tencent users
are more active in following others and seem to follow a
wider range of users than Twitter users do. This also
implies that Tencent may have a more complex network than
Twitter.
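
As an illustration of how a power-law exponent such as the
ones quoted above could be estimated, the following minimal
Python sketch fits a least-squares line to the degree
histogram in log-log space. The function name loglog_slope
and the fitting method are assumptions made for illustration;
the text does not state which estimator produced the reported
exponents.

```python
import math
from collections import Counter


def loglog_slope(degrees):
    """Least-squares slope of log10(frequency) against log10(degree).

    `degrees` holds one follower (or followee) count per user.
    Zero counts are skipped because log10(0) is undefined.
    """
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log10(d) for d in counts]
    ys = [math.log10(c) for c in counts.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct degree values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic data whose frequency falls off as degree ** -3.
    sample = [1] * 4096 + [2] * 512 + [4] * 64 + [8] * 8
    print(loglog_slope(sample))
```

A plain log-log fit is sensitive to the sparse tail of the
histogram, so a maximum-likelihood estimator may be preferable
on real data.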

As seen in Figure 10, the curve for "Followees" has two
significant glitches, one around 15 and the other around 40.
The reason is that Tencent Weibo was developed on top of
Tencent QQ (the biggest online chat service in China), so
when users joined Tencent Weibo, they also brought some of
their existing relationships with them, around 40 on
average; if they did not bring any relationships, Tencent
Weibo recommended 10 to 20 users to them according to their
history records from Tencent QQ. Besides, more than 30
Tencent users obtained more than 10^ followers during the
three-month study. As in Twitter, those users are mainly
celebrities (entertainment stars, hosts and economists),
government agencies, famous companies and big media or
entertainment agencies (such as Tencent News, online games,
joke center, beauty and fashion). These high-ranked
celebrities often have high influence, and their messages
are often forwarded by tens of thousands of their followers.
Some of those celebrities, government agencies and companies
also follow some ordinary users, and such reciprocal
behavior often significantly improves the influence of those
ordinary users.

Figure 10: Distributions of "Followings" and "Followers"

5.1.2 Relations of "Followees" and "Followers"

In order to find the correlation between the "Followees" and
"Followers" of Tencent users, we plot the number of
"Followees" a user has followed on the X axis against the
number of "Followers" the user has obtained on the Y axis, as
seen in Figure 11. The experimental data covers the whole
range of users, about 320 million.




Figure 11: Distribution of Followees-Followers (X axis: # of
Followees, from 1 to 2000)

As seen in Figure 11, each color point (800 x 800 color
points in total) in the heat map represents the number N of
users who have followed X users (the number on the X axis)
and have Y followers (the number on the Y axis); we assign
100% to the point with the maximum number of users, which is
about 6,067,947. In order to clearly observe the distribution
of users' "Followees" and "Followers", we take the double log
of all N; all the users can then be classified into five
significant groups (A1, A2, A3, A4, A5). Around 97% of users
are in area A1, whose "followees" range from 5 to 400 and
"followers" from 10 to 400; around 2.6% of users are in areas
A2, A3 and A4; less than 0.4% of users are in area A5. The
vertical direction represents users' influence, while the
horizontal direction represents users' activity: a bigger
value of Y means that the user has a higher influence in
obtaining followers, while a bigger value of X means that
the user is more active in following others. The number of
users from A1 to A5 follows an exponential decrease with
exponent around -6.3. The size of each area represents the
diversity of users' followee-follower relationships; for
area A5, though it contains less than 0.4% of users, the
"Followee-Follower" relations cover a wide range. The curve
in Figure 11 shows the median number of "Followers" for each
number of "Followees". Unlike in Twitter, the trend up to
1,500 followees exhibits a monotonic linear increase with
slope around 0.45, while a user whose number of followees
exceeds 1,500 has a higher probability of obtaining more
than 100,000 followers; one main reason is that 75% of those
users are official agencies, which pay more attention to
users' feedback on their services.
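
The construction of such a heat map and its median curve can
be sketched as follows. This is a minimal Python
illustration, not the plotting pipeline used for Figure 11:
the helper names, the bins_per_decade parameter and the
logarithmic binning scheme are all assumptions.

```python
import math
from collections import Counter, defaultdict
from statistics import median


def log_bin(value, bins_per_decade=10):
    """Map a positive count to a logarithmic bin index."""
    return int(math.floor(math.log10(value) * bins_per_decade))


def heatmap_cells(pairs, bins_per_decade=10):
    """Share of users per (followee bin, follower bin) cell.

    `pairs` is an iterable of (followees, followers) per user.
    The densest cell is scaled to 1.0, mirroring the "100%" point.
    """
    cells = Counter(
        (log_bin(x, bins_per_decade), log_bin(y, bins_per_decade))
        for x, y in pairs
        if x > 0 and y > 0
    )
    if not cells:
        return {}
    peak = max(cells.values())
    return {cell: n / peak for cell, n in cells.items()}


def median_followers_by_followees(pairs):
    """Median follower count for each distinct followee count."""
    by_followees = defaultdict(list)
    for x, y in pairs:
        by_followees[x].append(y)
    return {x: median(ys) for x, ys in sorted(by_followees.items())}


if __name__ == "__main__":
    users = [(10, 20), (10, 30), (10, 40), (100, 500)]
    print(heatmap_cells(users, bins_per_decade=1))
    print(median_followers_by_followees(users))
```

The slope of the median curve could then be obtained by
fitting a line to the (followees, median followers) pairs over
the range of interest.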

5.1.3 Relations of "Followers" and Other Actions

In this section, we investigate the relationship between the
number of users' "Followers" and their total number of
actions. As seen in Figure 12, the color nodes with the
maximum number of users are assigned 100%, which is about
2,750,930. Five significant areas can again be observed,
where A1 takes up 95% of Tencent users and the number of
users from A1 to A5 follows an exponential decrease with
exponent around -6.8. Unlike in Figure 11, the
same number of actions often covers a wider range of
"Followers", especially in areas A3 and A4. This means that
the diversity of users' actions influences their number of
followers; for example, of two users with the same number of
actions, one may have far more followers than the other
because his content is more interesting or because he is
connected with more influential users. For the majority of
users, high-quality content may contribute 100 to 200 more
followers.

In both Figures 11 and 12, the median curves keep a linear
increase from A1 to A4, while they scatter in area A5. This
shows that users' actions can help increase the number of
followers only to a certain extent, within which 99% of users
(areas A1 to A4) are located. Very few users exceed that
extent, and for those users in A5 the data clearly shows
that users' actions have no influence on the number of their
followers. This phenomenon is widely observed in Tencent
Weibo; for example, many Tencent users say that they cannot
increase their followers any further no matter which methods
they try. For a famous person, however, a casual message can
trigger tens of thousands of forwards and comments in less
than an hour. This can be explained by a potential friends
circle for each user: the scale of the circle is mainly
determined by the inner attributes of the user, which
determine how many other users will be interested in that
user. For example, a scientist can only attract those who
are interested in his/her research domain, while a movie
star has more common attributes that may attract more
followers.

Figure 12: The Distribution of Followers-Actions Relationship

Similar results can also be observed in Twitter [14], but
with different slope values. For example, the slope between
"Followees" and "Followers" is around 1.24 in Twitter, while
it is up to 3.21 in Tencent; the slope between "Followers"
and "Actions" is around 0.45 in Twitter, while it is about
0.16 in Tencent. This shows that in Twitter, a user's number
of actions has a more significant effect on the number of
"Followers" than in Tencent.

5.2 Analyze Influence from "Forward" Actions

In this section, we analyze users' "Forward" actions from
two aspects: the first is the relation between "Forward" and
"Follow"; the second is the relation between "Forward" and
other actions.

5.2.1 Relations of "Forward" and "Follow"

Previous studies [14] have analyzed the distribution of
"Retweet" behaviors among users' followees and followers.
Based on that research, we build a similar experiment to
analyze users' "Forward" behaviors in Tencent Weibo. First,
assume user u_i has k_1 followers and k_2 followees; to
illustrate the problem clearly, we define k_1 = k_2 = k,
after which a formula based on [14] is introduced as follows:

\[ Y(i) = \sum_{j=1}^{k} \left( \frac{O_{ij}}{\sum_{l=1}^{k} O_{il}} \right)^{2} \qquad (2) \]

O_ij means the number of times u_j forwards u_i's weibo (u_j
is a follower of u_i), or the number of times u_i forwards
u_j's weibo (u_j is a followee of u_i). The formula measures
the distribution of the "Forward" behaviors of u_i's
followers towards him and of u_i's "Forward" behaviors
towards his followees. The results are shown in Figure 13.

In Figure 13, "In Degree" means all the "Forward" actions of
user u_i's followers, while "Out Degree" means all the
"Forward" actions of user u_i towards all his followees. The
growth exponent of the dashed line is 0.9584 for "In Degree"
(bigger than Twitter, which is 0.801) and 0.8828 for "Out
Degree" (smaller than Twitter, which is 0.892). According to
Kwak's analysis, the closer a curve is to the dashed line,
the more likely it is that most of a user's forwards occur
within a subset of their followers; the closer it is to the
X axis, the more evenly the user's "Forward" actions are
distributed. The exponents show that, compared with Twitter,
some celebrities or agencies have received more attention
and have more influence in Tencent, while users' focus on a
subset of their followees is not as strong as in Twitter.
The median curves in the two sub-figures also show that
Tencent users tend to forward others' weibo across a wider
range.
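
The measure behind Figure 13 can be illustrated with a short
Python sketch. It assumes that Equation (2) takes the
disparity form used in [14], i.e. the sum over a user's k
neighbours of the squared share of forwards exchanged with
each neighbour; the function name forward_disparity is
hypothetical.

```python
def forward_disparity(forward_counts):
    """Sum over neighbours j of (O_ij / sum_l O_il) ** 2.

    `forward_counts` lists O_ij for one user u_i: the number of
    forwards exchanged with each of the k followers (or followees).
    Returns 0.0 when the user has no forwards at all.
    """
    total = sum(forward_counts)
    if total == 0:
        return 0.0
    return sum((count / total) ** 2 for count in forward_counts)


if __name__ == "__main__":
    evenly_spread = [5, 5, 5, 5]   # forwards shared equally by 4 neighbours
    concentrated = [20, 0, 0, 0]   # all forwards involve one neighbour
    print(forward_disparity(evenly_spread))
    print(forward_disparity(concentrated))
```

Under this form, forwards spread evenly over k neighbours give
a value of 1/k, while forwards concentrated on a single
neighbour give 1, which is why the measure separates even
distributions from concentrated ones.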

5.2.2 Relations of "Forward" and Other Actions

In this section, we investigate whether the influence
relationship between two users can be measured. We take
"Forward" as the main index to evaluate the influence
between two users X1 and X2: if X1 has a high probability of
forwarding X2's weibos, we say that X2 has a high influence
on X1. We mainly consider three features: the number of
replies, comments and mails between X1 and X2. We take the
three features as the X axis and the forward probability
between random user pairs as the Y axis, and draw Figure 14.

In Figure 14, the linear part of each sub-figure covers 99%
of users (their "Reply" and "Mail" actions number fewer than
20 and 45 respectively), which means that we can summarize
the relation patterns directly from the linear parts. As can
be seen, "Replies" and "Mails" are positively correlated
with "Forward" probabilities, but "Comments" show no
significant relation. The reason is that users' comments are
largely random: users often comment on a certain weibo
without caring who posted it.
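
A curve like those in Figure 14 can be obtained by grouping
user pairs by their interaction count and taking the fraction
of pairs in which a forward occurred. The Python sketch below
is a minimal illustration under that assumption; the input
format and the name forward_rate_by_count are hypothetical.

```python
from collections import defaultdict


def forward_rate_by_count(pairs):
    """Empirical forward probability per interaction count.

    `pairs` is an iterable of (interaction_count, forwarded) tuples,
    one per ordered user pair (X1, X2): the number of replies (or
    mails, or comments) from X1 to X2, and whether X1 forwarded X2.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for count, forwarded in pairs:
        totals[count] += 1
        if forwarded:
            hits[count] += 1
    return {count: hits[count] / totals[count] for count in sorted(totals)}


if __name__ == "__main__":
    observed = [
        (0, False), (0, False), (0, True), (0, False),
        (5, True), (5, False),
        (10, True), (10, True),
    ]
    print(forward_rate_by_count(observed))
```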

6. APPLICATIONS 

The analysis above has practical value, especially for
behavior prediction. In this section, we take the "Forward"
analysis in the previous section as an example and propose a
Conditional Random Field (CRF) based prediction model to
verify the application value of our analysis. We select the
numbers of "Replies", "Mails" and "Comments" as three
features, and add other features such as the keywords and
topics of weibos as aids to further improve the accuracy. We
use an exponential increase function to model the
relationship between the selected features and "Forward"
probabilities, as shown below:



\[ P(y \mid n_{type}(X_1, X_2)) = \frac{1}{1 + e^{-n_{type}}} \qquad (3) \]



Oij means the number of times uj forwards Ui^s weibo (uj 



y represents whether user X1 forwards X2: y = 1 means to
forward and y = 0 means not to forward; type denotes the
different actions, which include "Reply", "Mail" and
"Comment"; n_type means the number of times X1 "Replies" to,
"Mails" or "Comments" on X2. P(y | n_type(X1, X2)) is the
probability distribution of X1 forwarding X2 based on the
feature n_type(X1, X2). The experimental data is collected
from all actions of 2,000 highly active users, and the
results can be seen in Table 3.

Figure 13: The Relationship between "Follows" and "Forwards"

Table 3: Contribution of different indexes for "Forwarding"
prediction

Item        All      No Reply   No Mail   No Comment
Accuracy    0.6902   0.6746     0.6702    0.6903
Recall      0.9605   0.8958     0.8884    0.9605
F1-Score    0.8088   0.7883     0.7875    0.8089

The experiment shows that "Reply" and "Mail" contribute
positively to the performance, while "Comment" contributes
negatively. The result is consistent with our analysis; it
also illustrates that statistical analysis of Tencent Weibo
can help detect the inner relations of users' influence and
provides good guidance for system design and optimization.
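
To make the link function and the evaluation metrics more
concrete, the following Python sketch shows a logistic
mapping from an interaction count to a forward probability,
together with the usual accuracy, precision, recall and F1
computations. It is only an illustration and not the CRF
model itself: the weight and bias parameters are hypothetical,
the exact exponent in Equation (3) is assumed, and it is not
stated whether the "Accuracy" row of Table 3 denotes precision
or overall accuracy.

```python
import math


def forward_probability(n_type, weight=1.0, bias=0.0):
    """Logistic link from an interaction count to a forward probability.

    `weight` and `bias` are hypothetical parameters added for
    illustration; a fitted model would learn them from data.
    """
    return 1.0 / (1.0 + math.exp(-(weight * n_type + bias)))


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = forward)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    replies = [0, 1, 3, 8]
    # Predict "forward" when the modelled probability exceeds 0.7.
    predicted = [1 if forward_probability(n) > 0.7 else 0 for n in replies]
    actual = [0, 1, 1, 1]
    print(predicted, classification_metrics(actual, predicted))
```

An ablation in the style of Table 3 would repeat the
evaluation with one feature removed at a time and compare the
resulting metrics.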



7. CONCLUSION

The paper presents a systematic study of Tencent Weibo, one
of the largest micro-blogs in China, at both the macro and
micro levels, and compares the results with Twitter. At the
macro level, Tencent Weibo is more like a personalized media
center; the network of Tencent is more complex than that of
Twitter, and its users are more active than Twitter users.
Besides, the topics discussed in Tencent are very different
from those in Twitter: Tencent users like topics related to
entertainment, jokes and fashion, and they also like to keep
communicating on a certain number of old topics, which were
marked by other users. The unique features of Tencent users'
behaviors are mainly determined by the cultural background
in China. At the micro level, we mainly investigate users'
social influence from two aspects, "Follow" and "Forward":
we first analyze the statistical features of "Follow" and
"Forward", and then study the relations between "Follow",
"Forward" and users' other actions. We find that "Follow"
and "Forward", as well as "Follow" and users' actions, are
positively related up to a certain extent; some user actions
(Reply, Mail) have significant positive relations with
"Forward" probabilities, while "Comment" does not. Finally,
a prediction model is proposed to verify the application
value of our analysis. Experimental results show that the
analysis can help to better understand micro-blogs and
provide good guidance for system design and optimization.
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